The Evaluation of Tools Used to Predict the Impact of Missense Variants Is Hindered by Two Types of Circularity
Abstract
Prioritizing missense variants for further experimental investigation is a key challenge in current sequencing studies of complex and Mendelian diseases. A large number of in silico tools have been employed for pathogenicity prediction, including PolyPhen-2, SIFT, FatHMM, MutationTaster-2, MutationAssessor, Combined Annotation Dependent Depletion (CADD), LRT, phyloP, and GERP++, as well as optimized methods for combining tool scores, such as Condel and Logit. Given the abundance of these methods, an important practical question is which of them generalize best, that is, which correctly predict the pathogenic character of new variants. Here we demonstrate, in a study of 10 tools on five datasets, that such a comparative evaluation is hindered by two types of circularity. They arise when (1) the same variants or (2) different variants from the same protein occur both in the datasets used to train these tools and in those used to evaluate them, which may lead to overly optimistic performance estimates. We show that comparative evaluations that do not address these types of circularity may erroneously conclude that circularity-confounded tools are the most accurate of all the tools tested, and even that they outperform optimized combinations of tools.
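The two types of circularity described above can be made concrete with a small bookkeeping sketch. It assumes each variant is identified by a protein identifier, residue position, and reference and alternative amino acids; the `Variant` class and the functions `remove_circularity` and `split_by_protein` are hypothetical illustrations, not the authors' procedure or any tool's API. The first function flags evaluation variants that also appear in a training set (type 1) or that come from a protein already represented in training (type 2). The second assigns whole proteins to the evaluation set, so that neither type can occur.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen -> hashable, so variants can live in sets
class Variant:
    protein: str   # hypothetical protein identifier (e.g. an accession)
    position: int  # residue position within the protein
    ref: str       # reference amino acid
    alt: str       # substituted amino acid


def remove_circularity(train, test):
    """Partition test variants into (clean, type1, type2) relative to train.

    type1: the identical variant also occurs in the training data.
    type2: a different variant from a protein present in the training data.
    Type 1 is checked first, because a shared variant always implies a
    shared protein as well.
    """
    train_variants = set(train)
    train_proteins = {v.protein for v in train}
    clean, type1, type2 = [], [], []
    for v in test:
        if v in train_variants:
            type1.append(v)
        elif v.protein in train_proteins:
            type2.append(v)
        else:
            clean.append(v)
    return clean, type1, type2


def split_by_protein(variants, test_proteins):
    """Hypothetical protein-level split: every variant of a protein goes to
    the same side, so no protein is shared between training and evaluation."""
    train = [v for v in variants if v.protein not in test_proteins]
    test = [v for v in variants if v.protein in test_proteins]
    return train, test


if __name__ == "__main__":
    train = [Variant("P1", 10, "A", "V"), Variant("P2", 5, "G", "R")]
    test = [
        Variant("P1", 10, "A", "V"),  # type 1: seen verbatim in training
        Variant("P1", 42, "L", "P"),  # type 2: new variant, known protein
        Variant("P3", 7, "R", "W"),   # clean: unseen protein
    ]
    clean, type1, type2 = remove_circularity(train, test)
    print(len(clean), len(type1), len(type2))
```

Checking only for identical variants would remove type 1 circularity but leave type 2 in place; splitting at the protein level addresses both. In practice, the training sets of published tools may not be fully documented, so such filtering is only as complete as the available training data.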